Time series seasonal analysis is one of the core foundations of time series analysis. Generally, a time series could either have a seasonal component or not. In addition, a series could have more than one seasonal pattern, for example:

In this section we will focus on time series with a single seasonality pattern. In the following examples, we will use the the US monthly consumption of natural gas.

One of the main tools for exploring seasonality is data visualization. Another simplistic but useful is summary statistics tables.

library(tsibble)
library(dplyr)
library(lubridate)
library(plotly)
library(ggplot2)
library(viridis)
library(fabletools)
library(feasts)

Data

First, let’s load the series and reformat it into tsibble object:

naturalgas_path <- paste(rprojroot::find_rstudio_root_file(), "data", "NATURALGAS.csv", sep = "/")

us_gas <- read.csv(naturalgas_path, stringsAsFactors = FALSE) %>%
  setNames(c("date", "y")) %>%
  mutate(date = yearmonth(as.Date(date))) %>%
  as_tsibble(index = "date")


head(us_gas)
## # A tsibble: 6 x 2 [1M]
##       date     y
##      <mth> <dbl>
## 1 2000 Jan 2510.
## 2 2000 Feb 2331.
## 3 2000 Mar 2051.
## 4 2000 Apr 1783.
## 5 2000 May 1633.
## 6 2000 Jun 1513.

Just by eyeballing on the series plot you idenifty the strong seasonal pattern:

us_gas %>% plot_ly(x = ~ as.Date(date),
                   y = ~ y,
                   type = "scatter",
                   mode = "line",
                   name = "US Natural Gas") %>%
  layout(title = "US Natural Gas Consumption (Not Seasonally Adjusted)",
         yaxis = list(title = "Billion Cubic Feet"),
         xaxis = list(title = paste("Source: U.S. Bureau of Transportation Statistics,", "<br>" ,
                                    "Natural Gas Consumption [NATURALGAS]", sep = "")),
         hovermode = "compare")

Decomposing the Series

We can leverage the classical_decomposition function to decompose the series to the trend, seasonal and irregular components:

us_gas %>% 
  model(classical_decomposition(y)) %>%
  components() %>% 
  autoplot()

Summary Statistics

We will start with summary statistics tables by calculating:

We will create time features with the year and month functions from the lubridate package that enables us group by those features:

us_gas <- us_gas %>%
  mutate(year = year(date),
         month = month(date, label = TRUE))

head(us_gas)
## # A tsibble: 6 x 4 [1M]
##       date     y  year month
##      <mth> <dbl> <dbl> <ord>
## 1 2000 Jan 2510.  2000 Jan  
## 2 2000 Feb 2331.  2000 Feb  
## 3 2000 Mar 2051.  2000 Mar  
## 4 2000 Apr 1783.  2000 Apr  
## 5 2000 May 1633.  2000 May  
## 6 2000 Jun 1513.  2000 Jun

The year average could provide us insighs about change in the trend direction. This information can observed also from the decompose function:

us_gas %>% 
  as.data.frame() %>%
  group_by(year) %>%
  summarise(avg = mean(y)) %>%
  plot_ly(x = ~ year,
          y = ~ avg,
          type = "scatter",
          mode = "line")

If a series has a strong trend it may shadow some of the series seasonality. In this case, you can either use short part of the series or detrend the series.

The next summary is by the main frequency units - the month of the year:

us_gas %>% 
  as.data.frame() %>%
  group_by(month) %>%
  summarise(avg = mean(y))
## # A tibble: 12 x 2
##    month   avg
##    <ord> <dbl>
##  1 Jan   2829.
##  2 Feb   2529.
##  3 Mar   2344.
##  4 Apr   1903.
##  5 May   1725.
##  6 Jun   1713.
##  7 Jul   1894.
##  8 Aug   1911.
##  9 Sep   1688.
## 10 Oct   1781.
## 11 Nov   2054.
## 12 Dec   2573.

Box-plot plots

A box plot is a useful method to visualize distibution of the series observations by its frequency units:

us_gas %>% 
  plot_ly(x = ~month, y = ~ y, 
          type = "box", 
          color = ~month, 
          colors = "Dark2",
          boxpoints = "all", 
          jitter = 0.3,
          pointpos = -1.8)

While we can see a clear seasonal pattern, the distribution of each month is fairly wide. This is mainly due to the year over year growth (or the series trend). We can reduce the growth impact by detrend the series and replot the seasonal box-plot. We will use the classical_decomposition and model functions from the feast and fabletools packages, respectivlly:

us_gas_comp <- us_gas %>% 
  model(classical_decomposition(y)) %>%
  components() 

us_gas <- us_gas %>% 
  left_join(us_gas_comp %>% select(date, trend)) %>%
  mutate(detrend = y - trend)

Let’s replot the series monthly distribution, this time using the detrended series:

us_gas %>% 
  plot_ly(x = ~month, y = ~ detrend, 
          type = "box", 
          color = ~month, 
          colors = "Dark2",
          boxpoints = "all", 
          jitter = 0.3,
          pointpos = -1.8)

After reducing the trend impact, the monthly distribution, as expected, is narrowing down, and we can see a clear seasonal pattern.

YoY Plot

p <- plot_ly() 

years <- unique(us_gas$year)

colors <- viridis::viridis(n = length(years))

for(i in seq_along(years)){
  y <- NULL
  
  y <- us_gas %>% 
    as.data.frame() %>%
    filter(year == years[i])
  
  p <- p %>% add_lines(x = y$month,
                       y = y$y,
                       line = list(color = colors[i]),
                       name = years[i])
  
}

p
p <- plot_ly()

for(i in unique(us_gas$month)){
  m <- NULL
  
  m <- us_gas %>%
    as.data.frame() %>%
    filter(month == i)
  
  p <- p %>% 
    add_lines(x = m$year,
              y = m$y,
              name = i)
}

p

Heatmap

us_gas_df <- us_gas %>%
  as.data.frame() %>%
  select(year, month, y) %>%
  tidyr::pivot_wider(names_from = year,
                     values_from = y)


plot_ly(x = names(us_gas_df)[-1],
        y = us_gas_df$month,
        z = as.matrix(us_gas_df[, -1]),
        colors = "Reds",
        xgap = 3,
        ygap = 3,
        type = "heatmap")
ggplot(us_gas, aes(x = y)) +
geom_density(aes(fill = month)) +
ggtitle("US Gas - Kernel Density Estimates by Month") + facet_grid(rows = vars(as.factor(month)))

ggplot(us_gas, aes(x = detrend)) +
geom_density(aes(fill = month)) +
ggtitle("US Gas - Kernel Density Estimates by Month") + facet_grid(rows = vars(as.factor(month)))

us_gas %>%
gg_season(y, labels = "both") +
  ylab("Billion Cubic Feet") +
  xlab("Month") +
  ggtitle("US Natural Gas Consumption Seasonal Plot")

us_gas %>%
  gg_subseries(y) +
    ylab("Billion Cubic Feet") +
    xlab("Year") +
    ggtitle("US Natural Gas Consumption by Month")